Speaker Recognition Using Gaussian Mixtures Models

Authors

  • Eric Simancas-Acevedo
  • Akira Kurematsu
  • Mariko Nakano-Miyatake
  • Héctor M. Pérez Meana
Abstract

A speech signal carries several levels of information. First, it conveys the spoken message. Second, it carries information about the speaker's identity, emotional state, and so on. The task of speaker recognition can be divided into two parts: speaker identification and speaker verification. Speaker identification answers the question of which voice in a group of known voices best matches the input voice. Speaker verification answers the question of whether the speaker really is the person he or she claims to be. Speaker recognition can also be text-dependent or text-independent. In text-dependent speaker recognition, speech recognition is performed as well, using the same methods as in speech recognition. Speech and speaker recognition systems use various features [1] calculated from short intervals (called frames) of the speech signal: linear predictive coding (LPC) coefficients, cepstral coefficients derived from the LPC model (LPCC), mel-frequency cepstral coefficients (MFCC), Bark cepstrum coefficients, delta cepstrum, and so on. Each frame lasts about 25 ms, and successive frames overlap. The same features are often used in both speech and speaker recognition systems, although these are two completely different tasks. Many methods have been proposed for speaker modelling and recognition. In text-dependent speaker recognition, the most popular methods are dynamic time warping (DTW) and hidden Markov models (HMM) [2]. In text-independent speaker recognition, the most popular methods are vector quantization (VQ) [3], fully connected (ergodic) HMMs, artificial neural networks (ANN) [4], support vector machines (SVM) [5], and Gaussian mixture models (GMM) [6]. In this paper we propose a text-independent speaker recognition method with new feature vectors, consisting of the fundamental frequency and four formant frequencies, which are used to build Gaussian mixture speaker models.
Vector quantization was employed to estimate the initial parameters of the speakers' GMMs. Speaker recognition experiments were performed and compared with experiments using Gaussian mixture models with mel-frequency cepstral coefficients, which is the baseline in speaker recognition.
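
As a rough illustration of the approach outlined in the abstract, the Python sketch below trains one diagonal-covariance GMM per speaker, initializes the component means with a plain k-means codebook (standing in for the vector quantization step), and identifies a test utterance by the highest average log-likelihood. None of this is the authors' implementation. The function names, the number of mixtures, the diagonal covariances, and the synthetic five-dimensional vectors (fundamental frequency plus four formants, in Hz) are all assumptions made for the example.

```python
import math
import random

def kmeans(frames, k, iters=10, seed=0):
    """VQ-style codebook: k centroids found with plain k-means."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in frames:
            dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(f)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (a - m) ** 2 / v
                      for a, m, v in zip(x, mean, var))

def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def train_gmm(frames, k, em_iters=20, var_floor=1e-3):
    """Fit a diagonal GMM with EM, starting from the k-means codebook."""
    n, dim = len(frames), len(frames[0])
    means = kmeans(frames, k)
    weights = [1.0 / k] * k
    # Every component starts from the global per-dimension variance.
    gmean = [sum(col) / n for col in zip(*frames)]
    gvar = [max(sum((a - m) ** 2 for a in col) / n, var_floor)
            for col, m in zip(zip(*frames), gmean)]
    variances = [list(gvar) for _ in range(k)]
    for _ in range(em_iters):
        # E-step: posterior probability of each component for each frame.
        resp = []
        for x in frames:
            logp = [math.log(weights[j]) + log_gauss(x, means[j], variances[j])
                    for j in range(k)]
            norm = logsumexp(logp)
            resp.append([math.exp(lp - norm) for lp in logp])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-8)
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, frames)) / nj
                        for d in range(dim)]
            variances[j] = [max(sum(r[j] * (x[d] - means[j][d]) ** 2
                                    for r, x in zip(resp, frames)) / nj, var_floor)
                            for d in range(dim)]
    return weights, means, variances

def avg_log_likelihood(frames, gmm):
    weights, means, variances = gmm
    total = sum(logsumexp([math.log(w) + log_gauss(x, m, v)
                           for w, m, v in zip(weights, means, variances)])
                for x in frames)
    return total / len(frames)

def identify(frames, models):
    """Return the name of the speaker model that best explains the frames."""
    return max(models, key=lambda name: avg_log_likelihood(frames, models[name]))

def make_frames(center, spread, n, rng):
    """Synthetic [F0, F1, F2, F3, F4] vectors; the values are invented."""
    return [[rng.gauss(c, s) for c, s in zip(center, spread)] for _ in range(n)]

# Hypothetical speakers, described only by made-up mean frequencies in Hz.
CENTER_A = [120.0, 500.0, 1500.0, 2500.0, 3500.0]
CENTER_B = [210.0, 650.0, 1800.0, 2700.0, 3900.0]
SPREAD = [10.0, 50.0, 80.0, 100.0, 120.0]

rng = random.Random(1)
models = {
    "speaker_a": train_gmm(make_frames(CENTER_A, SPREAD, 200, rng), k=2),
    "speaker_b": train_gmm(make_frames(CENTER_B, SPREAD, 200, rng), k=2),
}
guess = identify(make_frames(CENTER_A, SPREAD, 50, rng), models)
```

In a real system the synthetic vectors would be replaced by features extracted from overlapping frames of about 25 ms, and the number of mixtures would be tuned on held-out speech.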

Similar resources

Principal mixture speaker adaptation for improved continuous speech recognition

Nowadays, almost all speaker-independent (SI) speech recognition systems use CDHMMs with multivariate Gaussian mixtures as the observation density to cover speaker variability. It has been shown that, given sufficient training data, the more mixtures are used in the HMM observation density, the better the system performs. However, acoustic HMM with more Gaussian densities is more complex and slows ...

New background modeling for speaker verification

A new background speaker modelling method is presented in this paper for text-independent speaker verification using Gaussian mixture models. This method does not require speech databases of other speakers to build background speaker models. A background model can be built directly from the same claimed speaker's database and has a smaller number of Gaussian mixtures compared to the claimed spe...

Comparison between supervised and unsupervised learning of probabilistic linear discriminant analysis mixture models for speaker verification

We present a comparison of speaker verification systems based on unsupervised and supervised mixtures of probabilistic linear discriminant analysis (PLDA) models. This paper explores current applicability of unsupervised mixtures of PLDA models with Gaussian priors in a total variability space for speaker verification. Moreover, we analyze the experimental conditions under which this applicatio...

A Bayesian Approach to Speaker Recognition Based on GMMs Using Multiple Model Structures

This paper proposes a speaker recognition technique using multiple model structures based on the Bayesian approach. In recent speaker recognition, many sophisticated statistical models have been proposed, e.g., Joint Factor Analysis and i-Vector based method. However, since most of them are based on Gaussian Mixture Models (GMMs), therefore improving estimation accuracy of generative models, i....

Publication date: 2001